Sublinear Projective Clustering with Outliers

Authors

  • Nina Mishra
  • Rajeev Motwani
  • Sergei Vassilvitskii
Abstract

Given a set of n points in R^d, a family of shapes S, and a number of clusters k, the projective clustering problem is to find a collection of k shapes in S such that the maximum distance from a point to its nearest shape is minimized. Special cases include the k-line center problem, where the goal is to cover the points with minimum-radius hypercylinders, and the k-hyperplane center problem, where the goal is to cover the points with minimum-width slabs. In practice, projective clustering algorithms are often used as a dimension reduction technique to enable more effective data representation for indexing and data mining on massive datasets (see, for example, [8, 9]). In typical applications the number of points n is extremely large, the dimensionality d is large, and the data contains some outliers, while the number of clusters k is small. Consequently, this paper emphasizes the running times of the algorithms. We present the first sublinear-time randomized algorithms for the k-line center and k-hyperplane center problems; the running times of our algorithms are independent of n.
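To make the objective in the abstract concrete, the following is a minimal sketch that only evaluates the k-line center cost for a given set of candidate lines. It is not the sublinear algorithm of the paper. The function names `dist_point_to_line` and `k_line_center_cost` are hypothetical, and handling outliers by discarding a fixed number of the farthest points is an assumption, since the abstract does not say how outliers are treated.

```python
import math

def dist_point_to_line(p, a, u):
    """Euclidean distance from point p to the line {a + t*u}; u must be a unit vector."""
    diff = [pi - ai for pi, ai in zip(p, a)]
    t = sum(di * ui for di, ui in zip(diff, u))  # length of the projection onto u
    residual_sq = sum(di * di for di in diff) - t * t
    return math.sqrt(max(residual_sq, 0.0))  # clamp tiny negative rounding errors

def k_line_center_cost(points, lines, num_outliers=0):
    """Largest point-to-nearest-line distance after dropping the farthest points.

    Assumption (not from the source): outliers are the num_outliers points
    that lie farthest from their nearest line.
    """
    nearest = sorted(min(dist_point_to_line(p, a, u) for a, u in lines) for p in points)
    kept = nearest[: max(0, len(nearest) - num_outliers)]
    return kept[-1] if kept else 0.0

# Two candidate lines in R^3 (k = 2): the x-axis and the y-axis.
lines = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
         ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
points = [(2.0, 0.0, 1.0), (0.0, 3.0, 0.5), (-4.0, 0.0, 0.0), (10.0, 10.0, 10.0)]

cost_all = k_line_center_cost(points, lines)                   # dominated by the far point
cost_robust = k_line_center_cost(points, lines, num_outliers=1)
```

With the single far point discarded, the cost is the radius of the thinnest pair of hypercylinders around the two axes that covers the remaining three points. Evaluating this cost takes time linear in n, which is exactly the dependence the paper's algorithms are stated to avoid.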

Similar resources

Projective Clustering Method for the Detection of Outliers in Non-Axis Aligned Subspaces

Clustering in non-axis-aligned subspaces and detecting outliers are major challenges due to the curse of dimensionality. Conventional clustering is efficient only in axis-aligned subspaces. To solve this problem, projective clustering has been defined as an extension to traditional clustering that attempts to find projected clusters in subsets of the dimensions of a data space. A pro...

Linear Time Algorithm for Projective Clustering

Projective clustering is a problem of both theoretical and practical importance and has received a great deal of attention in recent years. Given a set of points P in R^d, projective clustering is to find a set F of k lower-dimensional j-flats so that the average distance (or squared distance) from points in P to their closest flats is minimized. Existing approaches for this problem are ...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

A robust wavelet based profile monitoring and change point detection using S-estimator and clustering

Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications due to the complexity of many processes it is not usually possible to model a process using parametric models. In these cas...

Clustering cancer gene expression data by projective clustering ensemble

Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool to analyze gene expression data. Gene expression data is often characterized by a large number of genes but limited samples, thus various projective clustering techniques and ensemble techniques have been suggested to combat with th...

Publication date: 2005